Abstract: Let $X, X_1, \dots, X_n$ be i.i.d. Gaussian random variables in a separable Hilbert space $\mathbb{H}$ with zero mean and covariance operator $\Sigma = \mathbb{E}(X \otimes X)$, and let $\hat\Sigma := n^{-1}\sum_{j=1}^n (X_j \otimes X_j)$ be the sample (empirical) covariance operator based on $(X_1, \dots, X_n)$. Denote by $P_r$ the spectral projector of $\Sigma$ corresponding to its $r$-th eigenvalue $\mu_r$ and by $\hat P_r$ the empirical counterpart of $P_r$. The main goal of the paper is to obtain tight bounds on

$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left\{\frac{\|\hat P_r - P_r\|_2^2 - \mathbb{E}\|\hat P_r - P_r\|_2^2}{\mathrm{Var}^{1/2}(\|\hat P_r - P_r\|_2^2)} \le x\right\} - \Phi(x)\right|,$$

where $\|\cdot\|_2$ denotes the Hilbert-Schmidt norm and $\Phi$ is the standard normal distribution function. Such accuracy of normal approximation of the distribution of squared Hilbert-Schmidt error is characterized in terms of the so called effective rank of $\Sigma$ defined as $r(\Sigma) = \frac{\mathrm{tr}(\Sigma)}{\|\Sigma\|_\infty}$, where $\mathrm{tr}(\Sigma)$ is the trace of $\Sigma$ and $\|\Sigma\|_\infty$ is its operator norm, as well as another parameter characterizing the size of $\mathrm{Var}(\|\hat P_r - P_r\|_2^2)$. Other results include non-asymptotic bounds and asymptotic representations for the mean squared Hilbert-Schmidt norm error $\mathbb{E}\|\hat P_r - P_r\|_2^2$ and the variance $\mathrm{Var}(\|\hat P_r - P_r\|_2^2)$, and concentration inequalities for $\|\hat P_r - P_r\|_2^2$ around its expectation.
1. Introduction

Let $X$ be a mean zero Gaussian random vector in a separable Hilbert space $\mathbb{H}$ with covariance operator $\Sigma = \mathbb{E}(X \otimes X)$ and let $X_1, \dots, X_n$ be a sample of $n$ i.i.d. copies of $X$. The sample covariance operator $\hat\Sigma = \hat\Sigma_n$ is defined as follows: $\hat\Sigma := \hat\Sigma_n := n^{-1}\sum_{j=1}^n (X_j \otimes X_j)$. Denote by $\mu_r$ the $r$-th eigenvalue of $\Sigma$ (in decreasing order) and by $P_r$ the corresponding spectral projector of $\Sigma$ (that is, the orthogonal projector on the eigenspace of eigenvalue $\mu_r$). Let $\hat P_r$ denote a properly defined empirical counterpart of $P_r$ (see Section 2.2 for a precise definition). The main goal of the paper is to obtain a tight bound on the accuracy of normal approximation of the distribution of the squared Hilbert-Schmidt norm error $\|\hat P_r - P_r\|_2^2$ of the estimator $\hat P_r$. Another goal is to provide bounds on the risk $\mathbb{E}\|\hat P_r - P_r\|_2^2$ of this estimator as well as non-asymptotic bounds on concentration of the random variable $\|\hat P_r - P_r\|_2^2$ around its expectation. These bounds will be expressed in terms of natural complexity parameters of the problem, the most important one being the so called effective rank $r(\Sigma)$ that has been recently used in the literature (see [14], [2], [12]).

Definition 1. The following quantity $r(\Sigma) := \frac{\mathrm{tr}(\Sigma)}{\|\Sigma\|_\infty}$ is called the effective rank of $\Sigma$.
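As a small numerical illustration of Definition 1, the sketch below computes the effective rank from a list of eigenvalues. It assumes the covariance is given through its nonnegative eigenvalues (listed with multiplicities); the function names `effective_rank` and `rank` are hypothetical helpers introduced only for this example.

```python
def effective_rank(eigenvalues):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_inf.

    `eigenvalues` is an assumed input: the nonnegative eigenvalues of a
    covariance operator, listed with multiplicities.
    """
    top = max(eigenvalues)  # operator norm of a nonnegative symmetric operator
    if top <= 0:
        raise ValueError("at least one eigenvalue must be positive")
    return sum(eigenvalues) / top


def rank(eigenvalues):
    """Number of non-zero eigenvalues."""
    return sum(1 for value in eigenvalues if value > 0)
```

For an isotropic covariance the effective rank coincides with the rank, while a single dominant eigenvalue pushes it towards 1; in both cases it does not exceed the rank.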

Here $\mathrm{tr}(\Sigma)$ denotes the trace of $\Sigma$ and $\|\Sigma\|_\infty$ denotes its operator norm. The above definition clearly implies that $r(\Sigma) \le \mathrm{rank}(\Sigma)$. A recent result by Koltchinskii and Lounici, see [11], shows that, in the Gaussian case, the size of the operator norm error $\|\hat\Sigma - \Sigma\|_\infty$ of the sample covariance $\hat\Sigma$ is completely characterized by $\|\Sigma\|_\infty$ and $r(\Sigma)$. This makes the effective rank $r(\Sigma)$ the crucial complexity parameter of the problems of estimation of covariance and its spectral characteristics (its principal components) that allows one to study principal component analysis (PCA) problems in a unified dimension-free framework that includes their high-dimensional and infinite-dimensional versions (functional PCA, kernel PCA, etc). As in the preceding paper [10], our goal is to study the problem in a "high-complexity setting", where both the sample size $n$ and the effective rank $r(\Sigma)$ are large, although our primary focus is on the case when $r(\Sigma) = o(n)$, which implies operator norm consistency of both $\hat\Sigma$ and $\hat P_r$. This setting is much closer to high-dimensional covariance estimation and PCA problems than to standard results on PCA in Hilbert spaces with a fixed value of $\mathrm{tr}(\Sigma)$ (see, for instance, [4]) that are commonly used in the literature on functional PCA and kernel PCA. It includes, in particular, high-dimensional spiked covariance models (see [5], [6], [13]) in which

$$\Sigma = \sum_{j=1}^m s_j^2(\theta_j \otimes \theta_j) + \sigma^2 P_p, \tag{1.1}$$

where $\{\theta_j\}$ is an orthonormal basis of $\mathbb{H}$, $s_1^2, \dots, s_m^2$ are the variances of $m$ independent components of the "signal", $\sigma^2$ is the variance of the noise components and $P_p := \sum_{j=1}^p (\theta_j \otimes \theta_j)$ is the orthogonal projector on the linear span of the vectors $\theta_1, \dots, \theta_p$, where $p > m$. This models the covariance of a Gaussian signal with $m$ independent components observed in an independent Gaussian white noise. It is usually assumed that the number of components $m$ and the variances $s_1^2, \dots, s_m^2, \sigma^2$ are fixed, but the overall dimension of the problem $p = p_n \to \infty$ as $n \to \infty$ is large, implying that

$$\mathrm{tr}(\Sigma) = \sum_{j=1}^m s_j^2 + \sigma^2 p \sim \sigma^2 p \to \infty \quad \text{as } n \to \infty$$

and $r(\Sigma) \sim \sigma^2 p / \|\Sigma\|_\infty$. Estimation of the components of the "signal" $\theta_1, \dots, \theta_m$ is viewed as PCA for unknown covariance $\Sigma$. It is common to consider a sequence of high-dimensional problems in spaces $\mathbb{R}^p$, $p = p_n$ (rather than explicitly embed the spaces into an infinite dimensional Hilbert space $\mathbb{H}$). To assess the performance of the PCA, the loss function $L(a, b) := 2(1 - |\langle a, b\rangle|)$, where $a, b \in \mathbb{R}^p$ are unit vectors, was used in [1]. A closely related loss function is defined by $L'(a, b) := \|a \otimes a - b \otimes b\|_2^2 = 2(1 - \langle a, b\rangle^2)$, see, for instance, [3, 12, 15]. In the case of spiked covariance model with $\sigma^2 = 1$ and $p/n \to 0$ as $n \to \infty$, the following asymptotic representation of the risk holds, [1]:


$$\mathbb{E}L(\hat\theta_j, \theta_j) = \left[\frac{(p-m)(1+s_j^2)}{n s_j^4} + \frac{1}{n}\sum_{i \ne j}\frac{(1+s_i^2)(1+s_j^2)}{(s_i^2 - s_j^2)^2}\right](1+o(1)), \quad j = 1, \dots, m. \tag{1.2}$$

Under the assumption $p/n \to c > 0$ as $n \to \infty$ the classical PCA is known to yield inconsistent estimators of the eigenvectors, see, e.g., [6]. In [1], a thresholding procedure in the spirit of the diagonal thresholding of Johnstone and Lu [6] was proposed and it was proved that it achieves optimality in the minimax sense for the loss $L(\cdot, \cdot)$ under sparsity conditions on the eigenvectors of $\Sigma$.

In this paper, we are not making any structural assumptions on the covariance operator $\Sigma$, such as the spiked covariance model, sparsity, etc, but rather study the problem in terms of the complexity parameter $r(\Sigma)$. We derive representations of the Hilbert-Schmidt risk $\mathbb{E}\|\hat P_r - P_r\|_2^2$ of empirical spectral projectors in the case when $r(\Sigma) = o(n)$ that imply representation (1.2) for the spiked covariance model. Specifically, we prove that

$$\mathbb{E}\|\hat P_r - P_r\|_2^2 = (1+o(1))\frac{A_r(\Sigma)}{n}, \tag{1.3}$$

where $A_r(\Sigma) = 2\,\mathrm{tr}(P_r\Sigma P_r)\,\mathrm{tr}(C_r\Sigma C_r)$ and the operator $C_r$ is defined as $C_r := \sum_{s \ne r}\frac{1}{\mu_r - \mu_s}P_s$. In addition, we show that

$$\mathrm{Var}(\|\hat P_r - P_r\|_2^2) = (1+o(1))\frac{B_r^2(\Sigma)}{n^2}, \tag{1.4}$$

where $B_r(\Sigma) := 2\sqrt{2}\,\|P_r\Sigma P_r\|_2\,\|C_r\Sigma C_r\|_2$, and derive concentration bounds for the random variable $\|\hat P_r - P_r\|_2^2$ around its expectation. One of the main results of the paper is the following bound on the accuracy of normal approximation of the random variable $\|\hat P_r - P_r\|_2^2$ that holds under rather mild assumptions:


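$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left\{\frac{\|\hat P_r - P_r\|_2^2 - \mathbb{E}\|\hat P_r - P_r\|_2^2}{\mathrm{Var}^{1/2}(\|\hat P_r - P_r\|_2^2)} \le x\right\} - \Phi(x)\right| \le C\left[\frac{1}{B_r(\Sigma)} + \cdots\right], \tag{1.5}$$

where $\Phi(x)$ denotes the standard normal distribution function. This bound implies that the distribution of the random variable $\frac{\|\hat P_r - P_r\|_2^2 - \mathbb{E}\|\hat P_r - P_r\|_2^2}{\mathrm{Var}^{1/2}(\|\hat P_r - P_r\|_2^2)}$ is asymptotically standard normal as soon as $n \to \infty$, $B_r(\Sigma) \to \infty$ and $\frac{r(\Sigma)}{B_r(\Sigma)\sqrt{n}} \to 0$, which, in particular, implies that $r(\Sigma) = o(n)$.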

Throughout the paper, for $A, B \ge 0$, the notation $A \lesssim B$ means that there exists an absolute constant $C > 0$ such that $A \le CB$. Similarly, $A \gtrsim B$ means that $A \ge CB$ for an absolute constant $C > 0$ and $A \asymp B$ means that $A \lesssim B$ and $A \gtrsim B$. In the cases when the constant $C$ in the above bounds might depend on some parameter(s), say, $\gamma$, and we want to emphasize this dependence, we will write $A \lesssim_\gamma B$, $A \gtrsim_\gamma B$, or $A \asymp_\gamma B$. Also, throughout the paper (as it was already done in the introduction), $\|\cdot\|_2$ denotes the Hilbert-Schmidt norm and $\|\cdot\|_\infty$ the operator norm of operators acting in $\mathbb{H}$. With a minor abuse of notation, $\langle\cdot,\cdot\rangle$ denotes both the inner product of $\mathbb{H}$ and the Hilbert-Schmidt inner product. We will also use the sign $\otimes$ to denote the tensor product. For instance, for $u, v \in \mathbb{H}$, $u \otimes v$ is a linear operator in $\mathbb{H}$ defined as follows: $(u \otimes v)x = u\langle v, x\rangle$, $x \in \mathbb{H}$.
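The tensor product notation can be illustrated in finite dimensions. The following is a minimal sketch, assuming $\mathbb{H} = \mathbb{R}^d$ with vectors stored as Python lists; all function names are hypothetical helpers for this example. It also checks numerically the identity $\|a \otimes a - b \otimes b\|_2^2 = 2(1 - \langle a, b\rangle^2)$ for unit vectors, used for the loss $L'$ in the introduction.

```python
def inner(a, b):
    """Euclidean inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def tensor_apply(u, v, x):
    """Apply the rank-one operator u (x) v to x: (u (x) v) x = u <v, x>."""
    coefficient = inner(v, x)
    return [coefficient * ui for ui in u]


def tensor_matrix(u, v):
    """Matrix of u (x) v in the standard basis: entries u_i * v_j."""
    return [[ui * vj for vj in v] for ui in u]


def hs_norm_squared(matrix):
    """Squared Hilbert-Schmidt (Frobenius) norm of a matrix."""
    return sum(entry * entry for row in matrix for entry in row)


def projector_loss(a, b):
    """||a (x) a - b (x) b||_2^2 for vectors a, b (assumed to be unit vectors)."""
    pa, pb = tensor_matrix(a, a), tensor_matrix(b, b)
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(pa, pb)]
    return hs_norm_squared(diff)
```

For example, with $a = (1, 0)$ and $b = (0.6, 0.8)$ the loss equals $2(1 - 0.36) = 1.28$ up to rounding.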

In what follows, we will frequently prove exponential bounds for certain random variables, say, $\xi$, of the following type: for some constant $C > 0$ and for all $t \ge 1$, with probability at least $1 - e^{-t}$, $\xi \le C\sqrt{t}$. Often, it will be proved instead that the inequality holds with probability, say, $1 - 2e^{-t}$. In such cases, it is easy to rewrite the probability bound in the initial form by changing the value of the constant $C$. For instance, replacing $t$ by $t + \log 2$ allows one to claim that with probability $1 - e^{-t}$, $\xi \le C\sqrt{t + \log 2} \le C(1 + \log 2)^{1/2}\sqrt{t}$, which holds for all $t \ge 1$. In such cases, it will be said without further explanation that probability bound $1 - 2e^{-t}$ can be replaced by $1 - e^{-t}$ by adjusting the constants.
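The elementary inequality behind this adjustment of constants, $\sqrt{t + \log 2} \le (1 + \log 2)^{1/2}\sqrt{t}$ for $t \ge 1$, is easy to check numerically. The sketch below is only an illustration; the function name and the small floating-point tolerance are assumptions of this example.

```python
import math


def adjusted_bound_holds(t, tolerance=1e-12):
    """Check sqrt(t + log 2) <= sqrt(1 + log 2) * sqrt(t) up to rounding."""
    left = math.sqrt(t + math.log(2))
    right = math.sqrt(1 + math.log(2)) * math.sqrt(t)
    return left <= right + tolerance
```

The inequality is equivalent to $\log 2 \le t\log 2$, so it holds exactly when $t \ge 1$ (with equality at $t = 1$) and fails for $t < 1$.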


2. Preliminaries 

In this section, we discuss recent bounds on the operator norm $\|\hat\Sigma_n - \Sigma\|_\infty$ obtained in [11] and several well known results of perturbation theory used throughout the paper (see also [10]).

2.1. Bounds on the operator norm $\|\hat\Sigma_n - \Sigma\|_\infty$.

In [11], it was proved that, in the Gaussian case, moment bounds and concentration inequalities for the operator norm $\|\hat\Sigma - \Sigma\|_\infty$ are completely characterized by the operator norm $\|\Sigma\|_\infty$ and the effective rank $r(\Sigma)$. More precisely, the following theorems hold.

Theorem 1. Let $X, X_1, \dots, X_n$ be i.i.d. centered Gaussian random vectors in $\mathbb{H}$ with covariance $\Sigma = \mathbb{E}(X \otimes X)$. Then, for all $p \ge 1$,

$$\mathbb{E}^{1/p}\|\hat\Sigma - \Sigma\|_\infty^p \asymp_p \|\Sigma\|_\infty \max\left\{\sqrt{\frac{r(\Sigma)}{n}}, \frac{r(\Sigma)}{n}\right\}. \tag{2.1}$$

Theorem 2. Let $X, X_1, \dots, X_n$ be i.i.d. centered Gaussian random vectors in $\mathbb{H}$ with covariance $\Sigma = \mathbb{E}(X \otimes X)$. Then, there exists a constant $C > 0$ such that for all $t \ge 1$ with probability at least $1 - e^{-t}$,



( 2 . 2 ) 


As a consequence of this bound and (2.1), with some constant C > 0 and with 
the same probability 



(2.3) 


2.2. Perturbation theory 

Several simple and well known facts on perturbations of linear operators (see 
Kato [7]) will be stated in a form suitable for our purposes. The proofs of some 
of these facts that seem not to be readily available in the literature were given 
in [10] (see also Koltchinskii [9] and Kneip and Utikal [ 8 ] for some bounds in 
the same direction). 

Let $\Sigma : \mathbb{H} \mapsto \mathbb{H}$ be a compact symmetric operator (in our case, the covariance operator of a random vector $X$ in $\mathbb{H}$) with the spectrum $\sigma(\Sigma)$. The following spectral representation is well known to hold with the series converging in the operator norm: $\Sigma = \sum_{r \ge 1}\mu_r P_r$, where $\mu_r$ denotes distinct non-zero eigenvalues of $\Sigma$ arranged in decreasing order and $P_r$ the corresponding spectral projectors. Denote by $\sigma_i = \sigma_i(\Sigma)$ the eigenvalues of $\Sigma$ arranged in nonincreasing order and repeated with their respective multiplicities. Let $\Delta_r = \{i : \sigma_i(\Sigma) = \mu_r\}$ and let $m_r := \mathrm{card}(\Delta_r)$ denote the multiplicity of $\mu_r$. Define $g_r := g_r(\Sigma) := \mu_r - \mu_{r+1} > 0$, $r \ge 1$. Let $\bar g_r := \bar g_r(\Sigma) := \min(g_{r-1}, g_r)$ for $r \ge 2$ and $\bar g_1 := g_1$. The quantity $\bar g_r$ will be called the $r$-th spectral gap, or the spectral gap of eigenvalue $\mu_r$.
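The definitions of $\mu_r$, $m_r$, $g_r$ and $\bar g_r$ can be made concrete for a finite list of eigenvalues. The sketch below is illustrative only: the function names and the tolerance used to decide that two eigenvalues coincide are assumptions of this example, and for the smallest non-zero eigenvalue the next eigenvalue $\mu_{r+1}$ is taken to be $0$ (an assumption made here for the finite-dimensional case).

```python
def group_eigenvalues(sigmas, tol=1e-12):
    """Return [(mu_r, m_r)]: distinct non-zero eigenvalues in decreasing
    order together with their multiplicities."""
    groups = []
    for value in sorted(sigmas, reverse=True):
        if value <= tol:
            continue  # only non-zero eigenvalues are listed
        if groups and abs(groups[-1][0] - value) <= tol:
            groups[-1][1] += 1
        else:
            groups.append([value, 1])
    return [(mu, m) for mu, m in groups]


def spectral_gaps(mus):
    """Return (g, g_bar) with g_r = mu_r - mu_{r+1} and
    g_bar_r = min(g_{r-1}, g_r), g_bar_1 = g_1 (0-based lists)."""
    following = list(mus[1:]) + [0.0]  # assumption: mu_{R+1} = 0 after the last one
    g = [mu - nxt for mu, nxt in zip(mus, following)]
    g_bar = [g[0]] + [min(g[r - 1], g[r]) for r in range(1, len(g))]
    return g, g_bar
```

For the spectrum $(5, 3, 3, 1)$ this gives $\mu = (5, 3, 1)$ with multiplicities $(1, 2, 1)$, gaps $g = (2, 2, 1)$ and $\bar g = (2, 2, 1)$.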

Let now $\tilde\Sigma := \Sigma + E$ be another compact symmetric operator in $\mathbb{H}$ with spectrum $\sigma(\tilde\Sigma)$ and eigenvalues $\tilde\sigma_i = \sigma_i(\tilde\Sigma)$, $i \ge 1$ (arranged in nonincreasing order and repeated with their multiplicities), where $E$ is a perturbation of $\Sigma$. By Lidskii's inequality,

$$\sup_{j \ge 1}|\sigma_j(\Sigma) - \sigma_j(\tilde\Sigma)| \le \sup_{j \ge 1}|\sigma_j(E)| = \|E\|_\infty.$$

Thus, for all $r \ge 1$,

$$\inf_{j \notin \Delta_r}|\tilde\sigma_j - \mu_r| \ge \bar g_r - \sup_{j \ge 1}|\tilde\sigma_j - \sigma_j| \ge \bar g_r - \|E\|_\infty$$

and

$$\sup_{j \in \Delta_r}|\tilde\sigma_j - \mu_r| = \sup_{j \in \Delta_r}|\tilde\sigma_j - \sigma_j| \le \|E\|_\infty.$$

Assuming that the perturbation $E$ is small in the sense that

$$\|E\|_\infty < \frac{\bar g_r}{2},$$

it is easy to conclude that all the eigenvalues $\tilde\sigma_j$, $j \in \Delta_r$ are covered by an interval

$$\bigl(\mu_r - \|E\|_\infty, \mu_r + \|E\|_\infty\bigr) \subset \bigl(\mu_r - \bar g_r/2, \mu_r + \bar g_r/2\bigr)$$

and the rest of the eigenvalues of $\tilde\Sigma$ are outside of the interval

$$\bigl(\mu_r - (\bar g_r - \|E\|_\infty), \mu_r + (\bar g_r - \|E\|_\infty)\bigr) \supset \bigl[\mu_r - \bar g_r/2, \mu_r + \bar g_r/2\bigr].$$

Moreover, under the assumption $\|E\|_\infty < \frac{1}{4}\min_{1 \le s \le r}\bar g_s =: \bar\delta_r$, the set $\{\sigma_j(\tilde\Sigma) : j \in \bigcup_{s=1}^r \Delta_s\}$ of the largest eigenvalues of $\tilde\Sigma$ consists of $r$ "clusters", the diameter of each cluster being strictly smaller than $2\bar\delta_r$ and the distance between any two clusters being larger than $2\bar\delta_r$. Thus, it is possible to identify clusters of eigenvalues of $\tilde\Sigma$ corresponding to each of the $r$ largest distinct eigenvalues $\mu_s$, $s = 1, \dots, r$ of $\Sigma$. Let $\hat P_r$ be the orthogonal projector on the direct sum of eigenspaces of $\tilde\Sigma$ corresponding to the eigenvalues $\tilde\sigma_j$, $j \in \Delta_r$ (to the $r$-th cluster of eigenvalues of $\tilde\Sigma$). The following "partial resolvent" operator will be frequently used throughout the paper: $C_r := \sum_{s \ne r}\frac{1}{\mu_r - \mu_s}P_s$.

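We will need a couple of lemmas proved in [10] (see Lemmas 1 and 4 therein):

Lemma 1. The following bound holds:

$$\|\hat P_r - P_r\|_\infty \le 4\frac{\|E\|_\infty}{\bar g_r}. \tag{2.4}$$

Moreover,

$$\hat P_r - P_r = L_r(E) + S_r(E), \tag{2.5}$$

where

$$L_r(E) := C_r E P_r + P_r E C_r \tag{2.6}$$

and

$$\|S_r(E)\|_\infty \le 14\left(\frac{\|E\|_\infty}{\bar g_r}\right)^2. \tag{2.7}$$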

Lemma 2. Let $\gamma \in (0,1)$ and suppose that

$$\delta \le \frac{1-\gamma}{1+\gamma}\,\frac{\bar g_r}{2}. \tag{2.8}$$

Suppose also that

$$\|E\|_\infty \le (1+\gamma)\delta \quad \text{and} \quad \|E'\|_\infty \le (1+\gamma)\delta. \tag{2.9}$$

Then, there exists a constant $C_\gamma > 0$ such that

$$\|S_r(E) - S_r(E')\|_\infty \le C_\gamma\frac{\delta}{\bar g_r^2}\|E - E'\|_\infty. \tag{2.10}$$

3. Bounds on the risk of empirical spectral projectors

Let $\hat P_r$ be the orthogonal projector on the direct sum of eigenspaces of $\hat\Sigma$ corresponding to the eigenvalues $\{\sigma_j(\hat\Sigma), j \in \Delta_r\}$ (in other words, to the $r$-th cluster of eigenvalues of $\hat\Sigma$, see Section 2.2).

We will state simple bounds for the bias $\mathbb{E}\hat P_r - P_r$ and the "variance" $\mathbb{E}\|\hat P_r - \mathbb{E}\hat P_r\|_2^2$ that immediately imply a representation of the risk $\mathbb{E}\|\hat P_r - P_r\|_2^2$.

Denote

$$A_r(\Sigma) := 2\,\mathrm{tr}(P_r\Sigma P_r)\,\mathrm{tr}(C_r\Sigma C_r). \tag{3.1}$$

It is easy to see that

$$A_r(\Sigma) \le 2 m_r\frac{\mu_r}{\bar g_r^2}\|\Sigma\|_\infty r(\Sigma) \tag{3.2}$$


and 


(3.3) 

which implies that

$$A_r(\Sigma) \asymp r(\Sigma) \tag{3.4}$$

(assuming that $\|\Sigma\|_\infty$ and $m_r$ are bounded away both from $0$ and from $\infty$, $\bar g_r$ is bounded away from $0$ and $r(\Sigma) \to \infty$).


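Theorem 3. The following bounds hold:

1.

$$\|\mathbb{E}\hat P_r - P_r\|_\infty \lesssim \frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\max\left\{\frac{r(\Sigma)}{n}, \left(\frac{r(\Sigma)}{n}\right)^2\right\} \tag{3.5}$$

and

$$\|\mathbb{E}\hat P_r - P_r\|_2 \lesssim \sqrt{m_r}\,\frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\max\left\{\frac{r(\Sigma)}{n}, \left(\frac{r(\Sigma)}{n}\right)^2\right\}. \tag{3.6}$$

2. In addition,

$$\mathbb{E}\|\hat P_r - \mathbb{E}\hat P_r\|_2^2 = \frac{A_r(\Sigma)}{n} + \rho_n, \tag{3.7}$$

where

$$|\rho_n| \lesssim m_r\frac{\|\Sigma\|_\infty^4}{\bar g_r^4}\max\left\{\left(\frac{r(\Sigma)}{n}\right)^2, \left(\frac{r(\Sigma)}{n}\right)^4\right\} + \sqrt{\frac{m_r A_r(\Sigma)}{n}}\,\frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\max\left\{\frac{r(\Sigma)}{n}, \left(\frac{r(\Sigma)}{n}\right)^2\right\}. \tag{3.8}$$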

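3. If $\Sigma = \Sigma^{(n)}$, the sequences $\|\Sigma^{(n)}\|_\infty$ and $m_r = m_r^{(n)}$ are bounded away from $0$ and from $\infty$, $\bar g_r = \bar g_r^{(n)}$ is bounded away from $0$, and

$$r(\Sigma) = o(n),$$

then the following representation holds:

$$\mathbb{E}\|\hat P_r - P_r\|_2^2 = \frac{A_r(\Sigma)}{n} + O\left(\left(\frac{r(\Sigma)}{n}\right)^{3/2}\right) = (1+o(1))\frac{A_r(\Sigma)}{n}. \tag{3.9}$$

Remark 1. In the case of spiked covariance model (1.1), for all $r = 1, \dots, m$,

$$A_r(\Sigma) = 2\left[\frac{(p-m)(s_r^2+\sigma^2)\sigma^2}{s_r^4} + \sum_{1 \le j \le m,\ j \ne r}\frac{(s_r^2+\sigma^2)(s_j^2+\sigma^2)}{(s_r^2 - s_j^2)^2}\right].$$

Assuming that $m, s_1^2, \dots, s_m^2, \sigma^2$ are fixed, $p \to \infty$ and $p = o(n)$ as $n \to \infty$, it is easy to check that (3.9) implies bound (1.2) obtained in [1].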


PROOF. Recall the following relationship (see Lemma 1)

$$\hat P_r - P_r = L_r(E) + S_r(E), \tag{3.10}$$

where $E := \hat\Sigma - \Sigma$, $L_r(E) := C_r E P_r + P_r E C_r$ and $S_r(E) := \hat P_r - P_r - L_r(E)$. Clearly, $C_r P_r = P_r C_r = 0$ (due to the orthogonality of $P_r$ and $P_s$, $s \ne r$). Also, $P_r X$ and $C_r X$ are independent random variables (since, by the same orthogonality property, they are uncorrelated and $X$ is Gaussian).

To prove Claim 1, note that, since $\mathbb{E}L_r(E) = 0$, we have $\mathbb{E}\hat P_r - P_r = \mathbb{E}S_r(E)$. Therefore, by bound (2.7) of Lemma 1, we get

$$\|\mathbb{E}\hat P_r - P_r\|_\infty \le \mathbb{E}\|S_r(E)\|_\infty \le 14\frac{\mathbb{E}\|E\|_\infty^2}{\bar g_r^2}. \tag{3.11}$$

Bound (3.5) now follows from Theorem 1. Bound (3.6) is also obvious since $\hat P_r$, $P_r$ are operators of rank $m_r$, $L_r(E)$ is of rank at most $2m_r$ and $S_r(E) = \hat P_r - P_r - L_r(E)$ is of rank at most $4m_r$. Thus, $\|S_r(E)\|_2 \le 2\sqrt{m_r}\,\|S_r(E)\|_\infty$, and the result follows from the previous bounds.

To prove Claim 2, note that $\hat P_r - \mathbb{E}\hat P_r = L_r(E) + S_r(E) - \mathbb{E}S_r(E)$. Therefore,

$$\|\hat P_r - \mathbb{E}\hat P_r\|_2^2 = \|L_r(E)\|_2^2 + \|S_r(E) - \mathbb{E}S_r(E)\|_2^2 + 2\langle L_r(E), S_r(E) - \mathbb{E}S_r(E)\rangle. \tag{3.12}$$

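The following representations are obvious:

$$C_r E P_r = n^{-1}\sum_{j=1}^n C_r X_j \otimes P_r X_j, \qquad P_r E C_r = n^{-1}\sum_{j=1}^n P_r X_j \otimes C_r X_j. \tag{3.13}$$

Note that, by (3.13), due to orthogonality of $C_r E P_r$, $P_r E C_r$ and due to independence of $P_r X$, $C_r X$,

$$\begin{aligned}
\mathbb{E}\|L_r(E)\|_2^2 &= \mathbb{E}\|C_r E P_r + P_r E C_r\|_2^2 = \mathbb{E}\bigl(\|C_r E P_r\|_2^2 + \|P_r E C_r\|_2^2\bigr) = 2\mathbb{E}\|C_r E P_r\|_2^2 \\
&= 2\mathbb{E}\Bigl\|n^{-1}\sum_{j=1}^n P_r X_j \otimes C_r X_j\Bigr\|_2^2 = \frac{2\mathbb{E}\|P_r X \otimes C_r X\|_2^2}{n} = \frac{2\mathbb{E}\|P_r X\|^2\|C_r X\|^2}{n} \\
&= \frac{2\mathbb{E}\|P_r X\|^2\,\mathbb{E}\|C_r X\|^2}{n} = \frac{2\,\mathrm{tr}(P_r\Sigma P_r)\,\mathrm{tr}(C_r\Sigma C_r)}{n} = \frac{A_r(\Sigma)}{n}.
\end{aligned} \tag{3.14}$$

Next, note that $\mathbb{E}\|S_r(E) - \mathbb{E}S_r(E)\|_2^2 \le \mathbb{E}\|S_r(E)\|_2^2$. Recall that $S_r(E)$ is of rank at most $4m_r$ and $\|S_r(E)\|_2^2 \le 4m_r\|S_r(E)\|_\infty^2$. Quite similarly to (3.11), one can prove that $\mathbb{E}\|S_r(E)\|_\infty^2 \lesssim \frac{1}{\bar g_r^4}\mathbb{E}\|E\|_\infty^4$. Therefore, by Theorem 1, we get

$$\mathbb{E}\|S_r(E) - \mathbb{E}S_r(E)\|_2^2 \lesssim m_r\frac{\|\Sigma\|_\infty^4}{\bar g_r^4}\max\left\{\left(\frac{r(\Sigma)}{n}\right)^2, \left(\frac{r(\Sigma)}{n}\right)^4\right\}. \tag{3.15}$$

As a consequence of (3.14) and (3.15), it easily follows that

$$\mathbb{E}\bigl|\langle L_r(E), S_r(E) - \mathbb{E}S_r(E)\rangle\bigr| \le \mathbb{E}^{1/2}\|L_r(E)\|_2^2\,\mathbb{E}^{1/2}\|S_r(E) - \mathbb{E}S_r(E)\|_2^2 \lesssim \sqrt{\frac{A_r(\Sigma)}{n}}\sqrt{m_r}\,\frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\max\left\{\frac{r(\Sigma)}{n}, \left(\frac{r(\Sigma)}{n}\right)^2\right\}. \tag{3.16}$$

(3.7) and (3.8) now follow from (3.12), (3.14), (3.15) and (3.16).

Claim 3 is an easy consequence of the first two claims due to the "bias-variance decomposition" $\mathbb{E}\|\hat P_r - P_r\|_2^2 = \|\mathbb{E}\hat P_r - P_r\|_2^2 + \mathbb{E}\|\hat P_r - \mathbb{E}\hat P_r\|_2^2$ (see also (3.4)). □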


4. Concentration Inequalities 


The main goal of this section is to derive a concentration bound for the squared Hilbert-Schmidt error $\|\hat P_r - P_r\|_2^2$ around its expectation. Denote

$$B_r(\Sigma) := 2\sqrt{2}\,\|P_r\Sigma P_r\|_2\,\|C_r\Sigma C_r\|_2.$$
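Both $A_r(\Sigma)$ and $B_r(\Sigma)$ depend on $\Sigma$ only through its distinct eigenvalues and their multiplicities, since $\mathrm{tr}(P_r\Sigma P_r) = m_r\mu_r$, $\|P_r\Sigma P_r\|_2 = \sqrt{m_r}\,\mu_r$, $\mathrm{tr}(C_r\Sigma C_r) = \sum_{s \ne r}\frac{m_s\mu_s}{(\mu_r - \mu_s)^2}$ and $\|C_r\Sigma C_r\|_2^2 = \sum_{s \ne r}\frac{m_s\mu_s^2}{(\mu_r - \mu_s)^4}$. The following sketch evaluates them for a finite spectrum; the function name `a_r_b_r`, its 0-based index convention and the input format are assumptions made for this illustration.

```python
import math


def a_r_b_r(spectrum, r):
    """Compute (A_r, B_r) from a finite spectrum.

    spectrum: list of (mu_s, m_s), the distinct eigenvalues with multiplicities.
    r: 0-based index of the eigenvalue of interest (a convention of this sketch).
    """
    mu_r, m_r = spectrum[r]
    trace_c = 0.0      # tr(C_r Sigma C_r)
    hs_c_squared = 0.0  # ||C_r Sigma C_r||_2^2
    for s, (mu_s, m_s) in enumerate(spectrum):
        if s == r:
            continue
        gap = mu_r - mu_s
        trace_c += m_s * mu_s / gap ** 2
        hs_c_squared += m_s * mu_s ** 2 / gap ** 4
    a_r = 2.0 * (m_r * mu_r) * trace_c
    b_r = 2.0 * math.sqrt(2.0) * (math.sqrt(m_r) * mu_r) * math.sqrt(hs_c_squared)
    return a_r, b_r
```

For instance, a spiked model with one signal component of variance $s_1^2 = 1$, noise variance $\sigma^2 = 1$ and $p = 4$ has spectrum $\{(2, 1), (1, 3)\}$, which gives $A_1(\Sigma) = 12$ and $B_1(\Sigma) = 4\sqrt{6}$.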

Theorem 4. Suppose that, for some $\gamma \in (0,1)$,

$$\mathbb{E}\|\hat\Sigma - \Sigma\|_\infty \le \frac{(1-\gamma)\bar g_r}{2}.$$

Moreover, let $t \ge 1$ and suppose that


m, 


< 


1 , Mk/l<i. 

5r V n 


(4.1) 


(4.2) 


(4.3) 


Then, for some constant > 0 with probability at least 1 — e * 

2 

— 

9r n 


|Pr- Prill -E||P,-P,|| 


< P 7 


/I 

n ^ q'i n ^ gf n \ n 


(4.4) 


Note that the first term in the right hand side of (4.4) is dominant if $B_r(\Sigma) \to \infty$ and $\frac{r(\Sigma)}{B_r(\Sigma)\sqrt{n}} \to 0$. In the next section, it will be shown that under the same assumptions the random variable $\frac{\|\hat P_r - P_r\|_2^2 - \mathbb{E}\|\hat P_r - P_r\|_2^2}{\mathrm{Var}^{1/2}(\|\hat P_r - P_r\|_2^2)}$ is close in distribution to the standard normal and, in addition, $\mathrm{Var}^{1/2}(\|\hat P_r - P_r\|_2^2) = (1+o(1))\frac{B_r(\Sigma)}{n}$.

The main ingredient in the proofs of these results is a concentration bound for the random variables $\|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2$ given below.

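Theorem 5. Suppose that, for some $\gamma \in (0,1)$, condition (4.2) holds. Then, there exists a constant $L_\gamma > 0$ such that for all $t \ge 1$ the following bound holds with probability at least $1 - e^{-t}$:

$$\Bigl|\|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2 - \mathbb{E}\bigl(\|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2\bigr)\Bigr| \le L_\gamma m_r\frac{\|\Sigma\|_\infty^3}{\bar g_r^3}\left(\frac{r(\Sigma)}{n} \vee \frac{t}{n} \vee \left(\frac{t}{n}\right)^2\right)\sqrt{\frac{t}{n}}. \tag{4.5}$$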


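PROOF. It easily follows from Theorem 1 that under assumption (4.2)

$$\|\Sigma\|_\infty\left(\sqrt{\frac{r(\Sigma)}{n}} \vee \frac{r(\Sigma)}{n}\right) \lesssim \mathbb{E}\|\hat\Sigma - \Sigma\|_\infty \le \frac{(1-\gamma)\bar g_r}{2} \le \|\Sigma\|_\infty,$$

which implies that $r(\Sigma) \lesssim n$. Theorem 2 implies that for some constant $C' > 0$ and for all $t \ge 1$ with probability at least $1 - e^{-t}$

$$\|\hat\Sigma - \Sigma\|_\infty \le \mathbb{E}\|\hat\Sigma - \Sigma\|_\infty + C'\|\Sigma\|_\infty\left(\sqrt{\frac{t}{n}} \vee \frac{t}{n}\right).$$

We will first assume that

$$C\|\Sigma\|_\infty\sqrt{\frac{t}{n}} \le \frac{\gamma\bar g_r}{4} \tag{4.6}$$

with a sufficiently large constant $C \ge 1$ (the proof of the concentration bound in the opposite case will be much easier). This assumption easily implies that $t \le n$ and, if $C \ge C'$,

$$C'\|\Sigma\|_\infty\left(\sqrt{\frac{t}{n}} \vee \frac{t}{n}\right) \le C\|\Sigma\|_\infty\sqrt{\frac{t}{n}}.$$

Denote

$$\delta_n(t) := \mathbb{E}\|\hat\Sigma - \Sigma\|_\infty + C\|\Sigma\|_\infty\sqrt{\frac{t}{n}}.$$

Then $\mathbb{P}\{\|\hat\Sigma - \Sigma\|_\infty \ge \delta_n(t)\} \le e^{-t}$.

As before, denote $E := \hat\Sigma - \Sigma$. The main part of the proof is the derivation of a concentration inequality for the function

$$g(X_1, \dots, X_n) = \bigl(\|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2\bigr)\varphi\left(\frac{\|E\|_\infty}{\delta}\right),$$

where, for some $\gamma \in (0,1)$, $\varphi$ is a Lipschitz function on $\mathbb{R}_+$ with constant $\frac{1}{\gamma}$, $0 \le \varphi(s) \le 1$, $\varphi(s) = 1$, $s \le 1$, $\varphi(s) = 0$, $s > 1+\gamma$, and $\delta > 0$ is such that $\|E\|_\infty \le \delta$ with a high probability. This inequality will be then used with $\delta = \delta_n(t)$. Together with Theorem 2, it will imply bound (4.5) under the assumption (4.6).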

Our main tool is the following concentration inequality that easily follows 
from Gaussian isoperimetric inequality. 

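Lemma 3. Let $X_1, \dots, X_n$ be i.i.d. centered Gaussian random variables in $\mathbb{H}$ with covariance operator $\Sigma$. Let $f : \mathbb{H}^n \mapsto \mathbb{R}$ be a function satisfying the following Lipschitz condition with some $L > 0$:

$$\bigl|f(x_1, \dots, x_n) - f(x_1', \dots, x_n')\bigr| \le L\left(\sum_{j=1}^n \|x_j - x_j'\|^2\right)^{1/2}, \quad x_1, \dots, x_n, x_1', \dots, x_n' \in \mathbb{H}.$$

Suppose that, for a real number $M$,

$$\mathbb{P}\{f(X_1, \dots, X_n) \ge M\} \ge 1/4 \quad \text{and} \quad \mathbb{P}\{f(X_1, \dots, X_n) \le M\} \ge 1/4.$$

Then, there exists a numerical constant $D > 0$ such that for all $t \ge 1$,

$$\mathbb{P}\bigl\{|f(X_1, \dots, X_n) - M| \ge DL\|\Sigma\|_\infty^{1/2}\sqrt{t}\bigr\} \le e^{-t}.$$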


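We have to check now that the function $g(X_1, \dots, X_n)$ satisfies the Lipschitz condition (with a minor abuse of notation we view $X_1, \dots, X_n$ here as non-random vectors in $\mathbb{H}$ rather than random variables).

Lemma 4. Suppose that, for some $\gamma \in (0, 1/2)$,

$$\delta \le \frac{1-2\gamma}{1+2\gamma}\,\frac{\bar g_r}{2}. \tag{4.7}$$

Then, there exists a constant $L_\gamma > 0$ such that, for all $X_1, \dots, X_n, X_1', \dots, X_n' \in \mathbb{H}$,

$$\bigl|g(X_1, \dots, X_n) - g(X_1', \dots, X_n')\bigr| \le L_\gamma m_r\frac{\delta^2}{\bar g_r^3}\,\frac{\|\Sigma\|_\infty^{1/2} + \sqrt{\delta}}{\sqrt{n}}\left(\sum_{j=1}^n \|X_j - X_j'\|^2\right)^{1/2}. \tag{4.8}$$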


PROOF. Observe that

$$\|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2 = \|L_r(E) + S_r(E)\|_2^2 - \|L_r(E)\|_2^2 = 2\langle L_r(E), S_r(E)\rangle + \|S_r(E)\|_2^2 =: \tilde g(E).$$

Also, note that $L_r(E)$ is an operator of rank at most $2m_r$ and $S_r(E) = \hat P_r - P_r - L_r(E)$ has rank at most $4m_r$ (under the assumption that $\|E\|_\infty < \bar g_r/2$ implying that $\hat P_r$ is of rank $m_r$). This allows us to bound the Hilbert-Schmidt norms of such operators in terms of their operator norms: $\|A\|_2^2 \le \mathrm{rank}(A)\|A\|_\infty^2$. Thus, we get

$$|g(X_1, \dots, X_n)| \le 4\sqrt{2}\,m_r\bigl(\|L_r(E)\|_\infty\|S_r(E)\|_\infty + \|S_r(E)\|_\infty^2\bigr)\varphi\left(\frac{\|E\|_\infty}{\delta}\right).$$

Since $\varphi\left(\frac{\|E\|_\infty}{\delta}\right) = 0$ if $\|E\|_\infty > (1+\gamma)\delta$, claims (2.6), (2.7) of Lemma 1 imply that, under assumption (4.7),

$$|g(X_1, \dots, X_n)| \le c_\gamma m_r\frac{\delta^3}{\bar g_r^3} \tag{4.9}$$

for some constant $c_\gamma > 0$ depending only on $\gamma$.

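We will denote $\hat\Sigma' := n^{-1}\sum_{j=1}^n X_j' \otimes X_j'$, $E' := \hat\Sigma' - \Sigma$. Using now (2.6), (2.7), (4.9) and the fact that $\varphi$ is bounded by $1$ and Lipschitz with constant $\frac{1}{\gamma}$, which implies that the function $t \mapsto \varphi\left(\frac{t}{\delta}\right)$ is Lipschitz with constant $\frac{1}{\gamma\delta}$, we easily get that, under the assumptions

$$\|E\|_\infty \le (1+\gamma)\delta, \quad \|E'\|_\infty \le (1+\gamma)\delta, \tag{4.10}$$

the following inequality holds:

$$\begin{aligned}
\left|\tilde g(E)\varphi\left(\frac{\|E\|_\infty}{\delta}\right) - \tilde g(E')\varphi\left(\frac{\|E'\|_\infty}{\delta}\right)\right| &\le |\tilde g(E) - \tilde g(E')| + \frac{c_\gamma}{\gamma}m_r\frac{\delta^2}{\bar g_r^3}\|E - E'\|_\infty \\
&\le 2|\langle L_r(E - E'), S_r(E)\rangle| + 2|\langle L_r(E'), S_r(E) - S_r(E')\rangle| \\
&\quad + |\langle S_r(E) - S_r(E'), S_r(E) + S_r(E')\rangle| + \frac{c_\gamma}{\gamma}m_r\frac{\delta^2}{\bar g_r^3}\|E - E'\|_\infty.
\end{aligned} \tag{4.11}$$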

Using the Lipschitz bound of Lemma 2 and (2.6), (2.7) of Lemma 1, we easily get that

$$\bigl|g(X_1, \dots, X_n) - g(X_1', \dots, X_n')\bigr| \le c_\gamma' m_r\frac{\delta^2}{\bar g_r^3}\|E - E'\|_\infty, \tag{4.12}$$

where $c_\gamma' > 0$ depends only on $\gamma$.

A similar bound holds in the case when

$$\|E\|_\infty \le (1+\gamma)\delta, \quad \|E'\|_\infty > (1+\gamma)\delta$$

(when both norms are larger than $(1+\gamma)\delta$, the function $\varphi$ is equal to zero and the bound is trivial). Indeed, first consider the case when $\|E - E'\|_\infty \ge \gamma\delta$. Then, in view of (4.9), we have

$$\left|\tilde g(E)\varphi\left(\frac{\|E\|_\infty}{\delta}\right) - \tilde g(E')\varphi\left(\frac{\|E'\|_\infty}{\delta}\right)\right| = \left|\tilde g(E)\varphi\left(\frac{\|E\|_\infty}{\delta}\right)\right| \le c_\gamma m_r\frac{\delta^3}{\bar g_r^3} \le \frac{c_\gamma}{\gamma}m_r\frac{\delta^2}{\bar g_r^3}\|E - E'\|_\infty.$$

On the other hand, if $\|E - E'\|_\infty < \gamma\delta$, we have that $\|E'\|_\infty \le (1+2\gamma)\delta$ and, taking into account assumption (4.7), we can repeat the argument in the case (4.10) ending up with the same bound as (4.12) with a positive constant (possibly different from $c_\gamma'$, but still depending only on $\gamma$) in the right hand side.

The following bound (see Lemma 5 in [10]) provides a control of \\E — E'\\oc, ■ 




4||E||i/V4v^ A 


y/n 


1=1 


1/2 




1=1 


(4.13) 

Now substitute the last bound in the right hand side of (4.12) and observe that, 
in view of (4.9), the left hand side of (4.12) can be also upper bounded by 
2c.ymr^- Therefore, we get that with some constant > 0, 


g(Xi,...,xA-g(^l...,x;) 


(4.14) 


< 4c 

^ 9r 

r 

< L^TTlr^ 

9r 


- y 


/\2e 


l|E|l 


(E ''V(^ E ii^i - 




Using an elementary inequality $a \wedge b \le \sqrt{ab}$, $a, b \ge 0$, we get

n /T / " \ 1/2 

1=1 u=i '' 

This allows us to drop the last term in the maximum in the right hand side of 
(4.14) (since a similar expression is a part of the first term). This yields bound 
(4.8). 

□ 

Getting back to the proof of Theorem 5, it will be convenient to prove first a version of its concentration bound with a median instead of the mean. Denote by $\mathrm{Med}(\eta)$ a median of a random variable $\eta$ and define $M := \mathrm{Med}\bigl(\|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2\bigr)$. Let $\delta := \delta_n(t)$ and suppose that $t \ge \log(4)$ (by adjusting the constants, one can replace this condition by $t \ge 1$ as it is done in the statement of the theorem). Under conditions (4.2) and (4.6),

$$\delta_n(t) \le \left(1 - \frac{\gamma}{2}\right)\frac{\bar g_r}{2} = \frac{1-2\gamma'}{1+2\gamma'}\,\frac{\bar g_r}{2}$$

for some $\gamma' \in (0, 1/2)$. Thus, the function $g(X_1, \dots, X_n)$ satisfies the Lipschitz condition (4.8) with some constant $D_{\gamma'} = D_\gamma$. Also, we have $\mathbb{P}\{\|E\|_\infty \ge \delta\} \le e^{-t} \le 1/4$. Note that on the event $\{\|E\|_\infty < \delta\}$, $g(X_1, \dots, X_n) = \|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2$. Therefore,

$$\mathbb{P}\{g(X_1, \dots, X_n) \ge M\} \ge \mathbb{P}\{g(X_1, \dots, X_n) \ge M, \|E\|_\infty < \delta\} \ge \mathbb{P}\bigl\{\|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2 \ge M\bigr\} - \mathbb{P}\{\|E\|_\infty \ge \delta\} \ge 1/4.$$

Quite similarly, $\mathbb{P}\{g(X_1, \dots, X_n) \le M\} \ge 1/4$. It follows from Lemma 3 that with probability at least $1 - e^{-t}$

$$\bigl|g(X_1, \dots, X_n) - M\bigr| \le D_\gamma m_r\frac{\delta_n(t)^2}{\bar g_r^3}\|\Sigma\|_\infty^{1/2}\bigl(\|\Sigma\|_\infty^{1/2} + \sqrt{\delta_n(t)}\bigr)\sqrt{\frac{t}{n}}$$

with some constant $D_\gamma > 0$. Using the bound


that easily follows from the definition of <5„(t) and the bound of Theorem 1, we 
get that with some > 0 and with the same probability 


5(Ai,...,X„)-M 


< LjUi., 


/I 

9r \ ^ ^ n)\ n' 


Since $\mathbb{P}\{\|E\|_\infty \ge \delta\} \le e^{-t}$ and $g(X_1, \dots, X_n) = \|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2$ when $\|E\|_\infty < \delta$, we can conclude that with probability at least $1 - 2e^{-t}$

IP- — Pr 


||P,(P)||2-M 


< L. 


.nrir 


l|S|| = 



< LyTTlr 


l|S||^ 



Adjusting the value of the constant $L_\gamma$ one can replace the probability bound $1 - 2e^{-t}$ by $1 - e^{-t}$.

We will now prove a similar bound in the case when condition (4.6) does not 
hold. Then, 


9r 



(4.15) 


It follows from bound (2.4) and the definition of $L_r(E)$ that, for some constant $c > 0$,

$$\Bigl|\|\hat P_r - P_r\|_2^2 - \|L_r(E)\|_2^2\Bigr| \le c\,m_r\frac{\|E\|_\infty^2}{\bar g_r^2}.$$

We can now use the bounds of Theorems 1 and 2 to show that under condition (4.2) for some $C > 0$ with probability at least $1 - e^{-t}$



P. 


\\ME)\\l 


< Crrij. 




In view of condition (4.15), we get from the last bound that with some Ll), > 0 
with probability at least 1 — 



\\Lr{E)\\l 


< L'^nir 




This easily implies the following bound on the median M : 


M < L'^TTlr 


l|S||3 





Therefore, for some > 0 and for alH > 1, with probability at least 1 — e 


\Pr-Pr\\l-\\Lrm\l-M 


< L^nir 




g? 


V^V 


, (4.16) 


and the last bound was proved in both cases (4.6) and (4.15). 

It remains to integrate out the tails of exponential bound (4.16) to get the 
inequality 


mPr-Pr\\l-\\Lr{E)\\l)-M 
||E||L /r(S) , ,1\ 11 


< E 


\\Pr-Pr\\l-\\Lrm\l-M 


< 


L^m. 


9r 


r(S) \ / 1 

n ^ n 


with some > 0, which, along with (4.16), implies concentration inequality 
(4.5). 

□ 

We now turn to the proof of Theorem 4. 


PROOF. In view of Theorem 5, it is sufficient to obtain a concentration bound for $\|L_r(E)\|_2^2 - \mathbb{E}\|L_r(E)\|_2^2$. This could be done by rewriting $\|L_r(E)\|_2^2$ in terms of U-statistics and using the corresponding exponential bounds. However, we will follow a different (more elementary) path that directly utilizes the Gaussianity of the random variables $\{X_j\}$. The key ingredient is the following simple representation lemma. In what follows, $\xi \stackrel{d}{=} \eta$ means that random variables $\xi$ and $\eta$ have the same distribution.

Lemma 5. The following representation holds:

$$n\|L_r(E)\|_2^2 \stackrel{d}{=} 2\sum_{k \in \Delta_r}\gamma_k\|C_r X^{(k)}\|^2, \tag{4.17}$$

where $\gamma_k$ are the eigenvalues of the random matrix $\Gamma_r := n^{-1}\sum_{j=1}^n P_r X_j \otimes P_r X_j$ and $X^{(k)}$, $k \in \Delta_r$ are i.i.d. copies of $X$ independent of $\Gamma_r$.


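PROOF. Note that $n\|L_r(E)\|_2^2 = n\|P_r E C_r + C_r E P_r\|_2^2$. Since the operators $P_r E C_r$ and $C_r E P_r$ are orthogonal with respect to the Hilbert-Schmidt inner product and

$$\|P_r E C_r\|_2^2 = \mathrm{tr}(P_r E C_r C_r E P_r) = \mathrm{tr}(C_r E P_r P_r E C_r) = \|C_r E P_r\|_2^2,$$

we have

$$\|P_r E C_r + C_r E P_r\|_2^2 = \|P_r E C_r\|_2^2 + \|C_r E P_r\|_2^2 = 2\|P_r E C_r\|_2^2.$$

Also, note that $P_r E C_r = n^{-1}\sum_{j=1}^n P_r X_j \otimes C_r X_j$. Therefore,

$$n\|L_r(E)\|_2^2 = 2n\|P_r E C_r\|_2^2 = 2\Bigl\|n^{-1/2}\sum_{j=1}^n P_r X_j \otimes C_r X_j\Bigr\|_2^2. \tag{4.18}$$

Define the following mapping

$$T(u_1 \otimes u_2 \otimes u_3 \otimes u_4) = (u_1 \otimes u_3 \otimes u_2 \otimes u_4), \quad u_1, u_2, u_3, u_4 \in \mathbb{H}.$$

It can be extended in a unique way by linearity and continuity to a bounded linear operator $T : \mathbb{H} \otimes \mathbb{H} \otimes \mathbb{H} \otimes \mathbb{H} \mapsto \mathbb{H} \otimes \mathbb{H} \otimes \mathbb{H} \otimes \mathbb{H}$.

Recall that $P_r X_j$, $j = 1, \dots, n$ and $C_r X_j$, $j = 1, \dots, n$ are centered Gaussian random variables and they are uncorrelated (see the proof of Theorem 3). Therefore, they are also independent. Conditionally on $P_r X_j$, $j = 1, \dots, n$, the distribution of the random operator $U := n^{-1/2}\sum_{j=1}^n P_r X_j \otimes C_r X_j$ is centered Gaussian with covariance

$$\begin{aligned}
\mathbb{E}(U \otimes U \mid P_r X_j, j = 1, \dots, n) &= n^{-1}\sum_{j=1}^n \mathbb{E}\bigl(P_r X_j \otimes C_r X_j \otimes P_r X_j \otimes C_r X_j \mid P_r X_j, j = 1, \dots, n\bigr) \\
&= T\bigl(\Gamma_r \otimes \mathbb{E}(C_r X \otimes C_r X)\bigr) = T\bigl(\Gamma_r \otimes (C_r\Sigma C_r)\bigr).
\end{aligned}$$

Note that $\Gamma_r$ can be viewed as a symmetric operator acting in the eigenspace of eigenvalue $\mu_r$, and it is nonnegatively definite. Thus, it has spectral representation $\Gamma_r = \sum_{k \in \Delta_r}\gamma_k\phi_k \otimes \phi_k$, where $\gamma_k \ge 0$ are its eigenvalues and $\phi_k$ are its orthonormal eigenvectors (that belong to the eigenspace of $\mu_r$). It follows that

$$\mathbb{E}(U \otimes U \mid P_r X_j, j = 1, \dots, n) = T\Bigl(\sum_{k \in \Delta_r}\gamma_k\phi_k \otimes \phi_k \otimes \mathbb{E}(C_r X \otimes C_r X)\Bigr).$$

Let $X^{(k)}$, $k \in \Delta_r$ be independent copies of $X$ (also independent of $X_1, \dots, X_n$). Denote $V := \sum_{k \in \Delta_r}\sqrt{\gamma_k}\,\phi_k \otimes C_r X^{(k)}$. It is now easy to check that

$$\mathbb{E}(V \otimes V \mid P_r X_j, j = 1, \dots, n) = T\Bigl(\sum_{k \in \Delta_r}\gamma_k\phi_k \otimes \phi_k \otimes \mathbb{E}(C_r X \otimes C_r X)\Bigr),$$

implying that conditional distributions of $U$ and $V$ given $P_r X_j$, $j = 1, \dots, n$ are the same. As a consequence, the distribution of $n\|L_r(E)\|_2^2 = 2\|U\|_2^2$ coincides with the distribution of the random variable

$$2\|V\|_2^2 = 2\Bigl\|\sum_{k \in \Delta_r}\sqrt{\gamma_k}\,\phi_k \otimes C_r X^{(k)}\Bigr\|_2^2 = 2\sum_{k \in \Delta_r}\gamma_k\|C_r X^{(k)}\|^2. \tag{4.19}$$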

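□

Note that

$$\|C_r X^{(k)}\|^2 = \sum_{s \ne r}\sum_{j \in \Delta_s}\frac{\mu_s}{(\mu_s - \mu_r)^2}\eta_{k,j}^2,$$

where $\eta_{k,j} := \mu_s^{-1/2}\langle X^{(k)}, \theta_j\rangle$, $j \in \Delta_s$, $k \in \Delta_r$, $s \ne r$ are i.i.d. standard normal random variables, $\{\theta_j : j \in \Delta_s\}$ being an orthonormal basis of the eigenspace corresponding to $\mu_s$, $s \ge 1$. In view of representation (4.19), we get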


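$$n\|L_r(E)\|_2^2 \stackrel{d}{=} 2\sum_{k \in \Delta_r}\sum_{s \ne r}\sum_{j \in \Delta_s}\frac{\gamma_k\mu_s}{(\mu_s - \mu_r)^2}\eta_{k,j}^2$$

and, since $\gamma_k$, $k \in \Delta_r$ and $\eta_{k,j}$, $j \in \Delta_s$, $k \in \Delta_r$ are independent,

$$\begin{aligned}
n\mathbb{E}\|L_r(E)\|_2^2 &= 2\sum_{k \in \Delta_r}\sum_{s \ne r}\sum_{j \in \Delta_s}\frac{\mathbb{E}\gamma_k\,\mu_s}{(\mu_s - \mu_r)^2} = 2\sum_{s \ne r}\frac{\mathbb{E}\,\mathrm{tr}(\Gamma_r)\,m_s\mu_s}{(\mu_s - \mu_r)^2} \\
&= 2\sum_{s \ne r}\frac{\mathrm{tr}(P_r\Sigma P_r)\,m_s\mu_s}{(\mu_s - \mu_r)^2} = 2\sum_{s \ne r}\frac{m_r\mu_r m_s\mu_s}{(\mu_s - \mu_r)^2} = 2\,\mathrm{tr}(P_r\Sigma P_r)\,\mathrm{tr}(C_r\Sigma C_r) = A_r(\Sigma).
\end{aligned}$$

Therefore,

$$\|L_r(E)\|_2^2 - \mathbb{E}\|L_r(E)\|_2^2 \stackrel{d}{=} \frac{2}{n}\sum_{k \in \Delta_r}\sum_{s \ne r}\sum_{j \in \Delta_s}\frac{\mu_r\mu_s}{(\mu_s - \mu_r)^2}\frac{\gamma_k}{\mu_r}\bigl(\eta_{k,j}^2 - 1\bigr) + \frac{2}{n}\sum_{k \in \Delta_r}\sum_{s \ne r}\frac{\mu_r m_s\mu_s}{(\mu_s - \mu_r)^2}\left(\frac{\gamma_k}{\mu_r} - 1\right). \tag{4.20}$$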


In order to control the right hand side in the above display, the following 
elementary lemma will be used. 

Lemma 6. Let $\{\xi_k\}$ be i.i.d. standard normal random variables. There exists a numerical constant $c > 0$ such that for all $t > 0$




> t 



}]. 

supfc |Afe| jy 


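PROOF. By a simple computation,

$$\mathbb{E}\exp\bigl\{u\lambda_k(\xi_k^2 - 1)\bigr\} = \frac{1}{\sqrt{e^{2u\lambda_k}(1 - 2u\lambda_k)}}$$

for all $u > 0$ such that $2u\sup_k\lambda_k < 1$. Since $e^x(1-x) \ge (1+x)(1-x) = 1 - x^2$ for $x \in (0,1)$, we easily get

$$\frac{1}{\sqrt{e^x(1-x)}} \le \frac{1}{\sqrt{1 - x^2}} \le \sqrt{1 + 2x^2} \le e^{x^2}, \quad x \in (0, 2^{-1/2}).$$

This implies that for all $u > 0$ satisfying the condition $2u\sup_k\lambda_k < 2^{-1/2}$, the following bound holds:

$$\mathbb{E}\exp\Bigl\{u\sum_k\lambda_k(\xi_k^2 - 1)\Bigr\} \le \exp\Bigl\{4u^2\sum_k\lambda_k^2\Bigr\}.$$

The bound on $\mathbb{P}\bigl\{\sum_k\lambda_k(\xi_k^2 - 1) \ge t\bigr\}$ now follows by a standard application of Markov's inequality and optimizing the resulting bound with respect to $u$. Similarly,

$$\mathbb{E}\exp\bigl\{u\lambda_k(1 - \xi_k^2)\bigr\} = \frac{e^{u\lambda_k}}{\sqrt{1 + 2u\lambda_k}}.$$

Since $\frac{e^x}{\sqrt{1+2x}} = \exp\bigl\{x - \frac{1}{2}\log(1+2x)\bigr\} \le e^{x^2}$, $x \ge 0$, we get

$$\mathbb{E}\exp\Bigl\{u\sum_k\lambda_k(1 - \xi_k^2)\Bigr\} \le \exp\Bigl\{u^2\sum_k\lambda_k^2\Bigr\}, \quad u > 0,$$

implying the bound on the lower tail. □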
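The two elementary inequalities used in the proof, $\frac{1}{\sqrt{e^x(1-x)}} \le e^{x^2}$ for $x \in (0, 2^{-1/2})$ and $\frac{e^x}{\sqrt{1+2x}} \le e^{x^2}$ for $x \ge 0$, can be sanity-checked numerically. The sketch below is only such a check on a finite grid (not a proof); the function names and the grid size are assumptions of this example.

```python
import math


def upper_tail_factor(x):
    """1 / sqrt(e^x (1 - x)), the moment generating factor for the upper tail."""
    return 1.0 / math.sqrt(math.exp(x) * (1.0 - x))


def lower_tail_factor(x):
    """e^x / sqrt(1 + 2x), the moment generating factor for the lower tail."""
    return math.exp(x) / math.sqrt(1.0 + 2.0 * x)


def inequalities_hold_on_grid(points=200):
    """Check both bounds by e^{x^2} on a grid inside (0, 2^{-1/2})."""
    limit = 2.0 ** -0.5
    for k in range(1, points):
        x = limit * k / points
        bound = math.exp(x * x)
        if upper_tail_factor(x) > bound or lower_tail_factor(x) > bound:
            return False
    return True
```

The lower-tail inequality holds for all $x \ge 0$, so it can also be checked beyond the grid, for instance at $x = 1$ or $x = 5$.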

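Applying the bound of the lemma to the first term in the right hand side of relationship (4.20) conditionally on $\gamma_k$, $k \in \Delta_r$, we get that with probability at least $1 - e^{-t}$

$$\left|\frac{2}{n}\sum_{k \in \Delta_r}\sum_{s \ne r}\sum_{j \in \Delta_s}\frac{\mu_r\mu_s}{(\mu_s - \mu_r)^2}\frac{\gamma_k}{\mu_r}\bigl(\eta_{k,j}^2 - 1\bigr)\right| \lesssim \left(\sum_{s \ne r}\frac{\mu_r^2 m_s\mu_s^2}{(\mu_s - \mu_r)^4}\,\frac{\sum_{k \in \Delta_r}\gamma_k^2}{\mu_r^2}\right)^{1/2}\frac{\sqrt{t}}{n} \vee \sup_{s \ne r}\frac{\mu_r\mu_s}{(\mu_s - \mu_r)^2}\,\frac{\sup_{k \in \Delta_r}\gamma_k}{\mu_r}\,\frac{t}{n}.$$

Since $\sup_{k \in \Delta_r}\gamma_k = \|\Gamma_r\|_\infty$, $\sum_{k \in \Delta_r}\gamma_k^2 \le m_r\|\Gamma_r\|_\infty^2$ and

$$B_r^2(\Sigma) = 8\sum_{s \ne r}\frac{m_r\mu_r^2 m_s\mu_s^2}{(\mu_s - \mu_r)^4},$$

the last bound can be rewritten as

$$\left|\frac{2}{n}\sum_{k \in \Delta_r}\sum_{s \ne r}\sum_{j \in \Delta_s}\frac{\mu_r\mu_s}{(\mu_s - \mu_r)^2}\frac{\gamma_k}{\mu_r}\bigl(\eta_{k,j}^2 - 1\bigr)\right| \lesssim B_r(\Sigma)\frac{\|\Gamma_r\|_\infty}{\mu_r}\frac{\sqrt{t}}{n} \vee \frac{\|\Sigma\|_\infty^2}{\bar g_r^2}\frac{\|\Gamma_r\|_\infty}{\mu_r}\frac{t}{n}. \tag{4.21}$$

As to the second term in the right hand side of (4.20), the following bound is straightforward:

$$\left|\frac{2}{n}\sum_{k \in \Delta_r}\sum_{s \ne r}\frac{\mu_r m_s\mu_s}{(\mu_s - \mu_r)^2}\left(\frac{\gamma_k}{\mu_r} - 1\right)\right| \le \frac{2}{n}\sum_{s \ne r}\frac{m_r\mu_r m_s\mu_s}{(\mu_s - \mu_r)^2}\,\frac{\|\Gamma_r - P_r\Sigma P_r\|_\infty}{\mu_r} = \frac{A_r(\Sigma)}{n}\,\frac{\|\Gamma_r - P_r\Sigma P_r\|_\infty}{\mu_r}. \tag{4.22}$$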

Theorems 1 and 2 easily imply that for alH > 1 with probability at least 1 — e 


Ijr,. /^r.fr||oo 



PrXj^PrXj-E{PrX^PrX) 

t=l 


oo 


Mr- 



Under additional assumptions $m_r \lesssim 1$, $t \lesssim n$, this bound could be simplified as


llTj. MrI loo 



(4.23) 


and it implies that < 1. 

Thus, representation (4.20) and bounds (4.21), (4.22) imply that with
probability at least $1 - e^{-t}$


ll^r(i?)||^ 


nLr{E)\\i 


<i?,(E)^\/^P^ 




(4.24) 

To complete the proof, it is enough to combine bound (4.24) with the
concentration inequality of Theorem 5, to use bound (3.2) to control $A_r(\Sigma)$ and to take
into account conditions (4.3) to simplify the resulting bound.


□ 


5. Normal approximation of squared Hilbert-Schmidt norm errors
of empirical spectral projectors

The main result of this section is the following theorem: 

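Theorem 6. Suppose that, for some constants $c_1, c_2 > 0$, $m_r \le c_1$ and $\|\Sigma\|_\infty \le c_2\bar g_r$. Suppose also condition (4.2) holds with some $\gamma \in (0,1)$. Then, the following
bounds hold with some constant $C > 0$ depending only on $\gamma, c_1, c_2$:

$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left\{\frac{n\bigl(\|\hat P_r-P_r\|_2^2-\mathbb{E}\|\hat P_r-P_r\|_2^2\bigr)}{B_r(\Sigma)}\le x\right\}-\Phi(x)\right| \le C\left[\frac{1}{B_r(\Sigma)}+\frac{r(\Sigma)}{B_r(\Sigma)\sqrt n}\sqrt{\log\left(\frac{B_r(\Sigma)\sqrt n}{r(\Sigma)}\vee 2\right)}+\frac{\log n}{\sqrt n}\right] \tag{5.1}$$

and

$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left\{\frac{\|\hat P_r-P_r\|_2^2-\mathbb{E}\|\hat P_r-P_r\|_2^2}{\mathrm{Var}^{1/2}(\|\hat P_r-P_r\|_2^2)}\le x\right\}-\Phi(x)\right| \le C\left[\frac{1}{B_r(\Sigma)}+\frac{r(\Sigma)}{B_r(\Sigma)\sqrt n}\sqrt{\log\left(\frac{B_r(\Sigma)\sqrt n}{r(\Sigma)}\vee 2\right)}+\frac{\log n}{\sqrt n}\right], \tag{5.2}$$

where $\Phi(x)$ denotes the distribution function of a standard normal random
variable.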

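This result essentially means that as soon as $B_r(\Sigma) \to \infty$ and $\frac{r(\Sigma)}{B_r(\Sigma)\sqrt n} \to 0$
as $n \to \infty$ (for $\Sigma = \Sigma^{(n)}$), the sequence of random variables $\frac{\|\hat P_r-P_r\|_2^2-\mathbb{E}\|\hat P_r-P_r\|_2^2}{\mathrm{Var}^{1/2}(\|\hat P_r-P_r\|_2^2)}$
is asymptotically standard normal.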

We will first establish the following fact that would allow us to replace $B_r(\Sigma)/n$
in bound (5.1) by the normalizing factor $\mathrm{Var}^{1/2}(\|\hat P_r - P_r\|_2^2)$ in bound (5.2).

Theorem 7. Suppose condition (4.2) holds for some $\gamma \in (0,1)$. Then the
following bound holds with some constant $C_\gamma > 0$:


Bri^) 


Vari/2(||P,-P,||2)-l 


< a 


||E||g, r(E) , m, + l 
^ Pr Br{T,)y/n n 


(5.3) 


Bound (5.3) shows that, under the assumptions mr ^ 
mr = o{n), we have 


$$\mathrm{Var}^{1/2}(\|\hat P_r - P_r\|_2^2) = (1 + o(1))\frac{B_r(\Sigma)}{n}.$$

Remark 2. Note that in the case of spiked covariance model (1-1), for r = 

l,...,m, 




l^j^m,j^r 




(5.4) 

which, under the assumption that the parameters $m, s_1^2, \dots, s_m^2, \sigma^2$ are fixed, but
$p = p_n \to \infty$ as $n \to \infty$, yields that

Pr.(Ej = (1 + o(l))- 2 - as n —>• 00. (5.5) 


Note also that $r(\Sigma) \asymp p$. Thus, the condition $p = o(n)$ implies $\frac{r(\Sigma)}{B_r(\Sigma)\sqrt n} \to 0$
as $n \to \infty$. Therefore, Theorem 7 yields that

vaC/^dlA - PAH) = (i + oA))NM±f>l:/E. 
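Moreover, the bounds on the accuracy of normal approximation of Theorem 6
are of the order

$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}\left\{\frac{\|\hat P_r-P_r\|_2^2-\mathbb{E}\|\hat P_r-P_r\|_2^2}{\mathrm{Var}^{1/2}(\|\hat P_r-P_r\|_2^2)}\le x\right\}-\Phi(x)\right| \le C\left[\frac{1}{\sqrt p}+\sqrt{\frac{p}{n}\log\left(\frac{n}{p}\vee 2\right)}+\frac{\log n}{\sqrt n}\right],$$

so the asymptotic normality of $\|\hat P_r - P_r\|_2^2$ holds if $p = p_n \to \infty$ and $p = o(n)$
as $n \to \infty$.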
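To see how this rate behaves, one can evaluate it for a few pairs $(n, p)$. The sketch below assumes the order of the bound is $1/\sqrt p + \sqrt{(p/n)\log(n/p \vee 2)} + \log n/\sqrt n$ with the constant $C$ dropped; the function name `spiked_rate` is hypothetical.

```python
import math


def spiked_rate(n: int, p: int) -> float:
    """Order of the normal approximation error in the spiked model (constant C omitted).

    Assumed form: 1/sqrt(p) + sqrt((p/n) * log(max(n/p, 2))) + log(n)/sqrt(n).
    """
    dimension_term = 1.0 / math.sqrt(p)
    ratio_term = math.sqrt((p / n) * math.log(max(n / p, 2.0)))
    sample_term = math.log(n) / math.sqrt(n)
    return dimension_term + ratio_term + sample_term


if __name__ == "__main__":
    # p grows while p/n shrinks: the rate should decrease.
    for n, p in [(10**4, 10**2), (10**6, 10**3), (10**8, 10**4)]:
        print(n, p, spiked_rate(n, p))
    # p of the same order as n: the middle term does not vanish.
    print(spiked_rate(10**4, 10**4))
```

The middle term is the one that forces $p = o(n)$: it stays bounded away from zero when $p$ is proportional to $n$.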


PROOF. In view of relationships n\\Lr{E )\\2 = 2||F||| and (4.19) (see the proof 
of Lemma 5), we have 


Var(||L,(L;)||2) = AvardlFH^) = Ayar ( ^ 7fc||axWf 


KkeAr 


= 4 ® 


Var ^ lk\\CrX^%‘ 

\keAr 


PrXi, . . . , PrXn 


rVar E 


.feGAr 


PrXi, . . . , PrXn 


(5.6) 


Recall that 7^, k £ depend only PrXi ,..., PrX^ and that k £ are 

independent of Xi,..., X„. Thus, we get 


E 


Var V 7fc||MW|p 


PrXi, . . . , PrXn 


= E 


^ 7fc^Var(||avWf) 


= E[||r,||2 ]Var(||axf). 


^ E[72]Var(||avWf) 

feGAr 

(5.7) 


By an easy computation, 


E||r,||2 = E 


n ^ PrXj (g) PrXj 
i=i 


= rUrpl ( 1 + 


rUr + 1 


and, for i.i.d. standard normal random variables {r]j} 


Var(||avf)=Var ^ ^ 




vs^rj^As 
Therefore,
E 


Var ^ 

VfceA,. 


PrXi, . . . , PrX„ 


f ^ ^ mr + l 


Similarly, we have 


Var E 


and 


^ 7 feliaxW|p 


.fceAr 


PrXi, . . . , PrX„ 


= Var(tr(r,))(E[||axf])' 


(5.8) 

= Var(^ 7feE[||M«f]^ 

(5.9) 


\keAr 


Var(tr(r,)) = (E [HMf])" = -^yl2(E), 


implying that 


Var E 


E 7feliax«|p 


.feeAr 


PrXi, . . . , PrX„ 


2mrn 


It follows from (5.6), (5.8) and (5.10) that 


Var(||T,(i?)||l) = 


B^iE) mr + l\ 2A2(E) 


1 + 


+ 


(5.10) 


(5.11) 


Denote now 


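$$\xi := \|\hat P_r-P_r\|_2^2 - \mathbb{E}\|\hat P_r-P_r\|_2^2, \quad \eta := \|L_r(E)\|_2^2 - \mathbb{E}\|L_r(E)\|_2^2 \tag{5.12}$$

and $\sigma_\xi^2 := \mathbb{E}\xi^2$, $\sigma_\eta^2 := \mathbb{E}\eta^2$. Combining the concentration bound of Theorem 5 with
the identity $\mathbb{E}|\xi-\eta|^2 = \int_0^\infty \mathbb{P}\{|\xi-\eta|^2 \ge t\}\,dt$, we obtain that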

vW^<C^777,^P^®^, (5.13) 

Or n y/n 

for some C'.y > 0 depending only on 7. 

To complete the proof, observe that identity (5.11) implies that 


S.(E) 


Var‘''"(||L,(E)||^) - 1 


<Ji+”jL±l+ _i 


n m, 


■BUT,), 




^/2Ar(T,) ^ nir + 1 ^ y/^ArCE) 


,Jrnf^Br{E),Jn n ,Jrnf.Br{E),Jn^ 

then bound 7lr(E) using (3.2) and combine the resulting bound with (5.13). 
We now return to the proof of Theorem 6.


PROOF. Under notations (5.12), we will upper bound $\sup_{x\in\mathbb{R}}\bigl|\mathbb{P}\{n\xi/B_r(\Sigma) \le x\} - \Phi(x)\bigr|$.
Theorem 7 will allow us to rewrite the normalizing factor in terms of the
variance. First recall that by Theorem 5, with probability at least $1 - e^{-t}$,


1 ^ - 77I < LjTti, 
Also, by (4.20), 

77 = 

2 


n 


2 

n 


EEE 


9t 

'v n V 

E 

fXrl-ts 

(/Ts - 


Hr9s { 


ivlj - 1) 


Tfc 


- 1 ivli - 1) 




MrWsMs / 7fc 




- 1 


=: Cr + C2 + Ca- 


(5.14) 


(5.15) 


Similarly to bound (4.21), we get that with probability at least $1 - e^{-t}$


iC 2 i<i?.(s)E 


fij-Pj- ||oo ^/t 

fir n 


V 


9r 9r n' 


(5.16) 


Assume that $1 \le t \le n$ and $m_r \le n$. It follows from (5.16), (4.22), (4.23) and
also from bound (3.2) on $A_r(\Sigma)$ that



(5.17) 



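Under the assumptions of the theorem $m_r \lesssim 1$, $\|\Sigma\|_\infty \lesssim \bar g_r$, it is easy to get
from (5.14), (5.15) and (5.17) that

$$\frac{n\xi}{B_r(\Sigma)} = \tau + \zeta, \tag{5.18}$$

where

$$\tau := \frac{2}{B_r(\Sigma)}\sum_{k\in\Delta_r}\sum_{s\ne r}\sum_{j\in\Delta_s}\frac{\mu_r\mu_s}{(\mu_s-\mu_r)^2}\bigl(\eta_{k,j}^2-1\bigr) \tag{5.19}$$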
and the remainder $\zeta$ satisfies the following bound with probability at least $1 - e^{-t}$:




^3/2 


Br{T,)^/n' 


(5.20) 


We now use the Berry-Esseen theorem and a simple limiting argument that
allows one to apply it to a (possibly) infinite sum of independent random variables
(5.19) to get the following bound:


sup |P {r < a;} — $(x)| < 

cceM. 


E 




mrP^nXsp.'i 


1 Es/r {fj.s-fj.rr 


Z/2 


< 


II^IIL 1 

9l 


(5.21) 


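where we also used the fact that $\sum_{s\ne r}\frac{m_r\mu_r^2\, m_s\mu_s^2}{(\mu_s-\mu_r)^4} = \frac{B_r^2(\Sigma)}{8}$.

It follows from (5.18), (5.20) and (5.21) that with some constants $c', c'' > 0$,
for all $x \in \mathbb{R}$,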


P 


BriW 



< P 


r < X -f 



(5.22) 


< $ ^x -I- c' 

< d>(x) -I- c 



+ B,(E) 


5.(S)’ 


where we used the fact that $\Phi$ is a Lipschitz function with constant less than
one. Quite similarly,


Br{^)^ 


- (v^ V b,(L)U'^V B,(E)V7i) 


B.(E) 


> $(x) 


_,(_t \/ r(S) 




fV^V- 




\\/n * Brijy^jn * Br{'B)^Jn) 
It follows from (5.22) and (5.23) that 
n 


Bri^)' 


sup 

xGM. 

< c' 


B.(E) 


^ < X > — ^(x) 


t 


\/_r^ \ c" 

■ V B,(E)v^^ V s,(e) 


The last bound will be used with 


— e 
(5.23) 


(5.24) 


t = logBr(S)/\log Y2^/\logn, (5.25) 
which implies that


< 


l'(S) /T\ / \ / 


— V V . w w 

y/n V Br(S)Vn Br-(S)v^ V 


(5.26) 


and we also have 


^3/2 ^ logn Y^logBr(S) 


y/n Br{T,) 

Without loss of generality we can assume that Br{T,) is bounded away from 0 
by a numerical constant so that ^ (otherwise, the bounds of the 


s.(s) ^ 

j3/2 


theorem trivially hold). This implies that a (s)yH — (5-24) implies 


sup 

xGK. 


< C 


^ < X > — <I)(x) 


Bri^) 

1 , r(S) 


(5.27) 


Br{Ti) Br(S)y/n 


/log 


Br(X)y/n 

r(S) 


V 2 


logn 

y/n 


which proves bound (5.1). 

To complete the proof of bound (5.2), it is enough to use Theorem 7 to replace
the normalization with $B_r(\Sigma)/n$ by the normalization with the standard deviation $\sigma_\xi$
of $\xi$. To this end, note that


1 

0-? 


Bri-B) 


?+ -- 


cr^ i?r(S) 


?• 


(5.28) 


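Under the assumptions $m_r \lesssim 1$ and $\|\Sigma\|_\infty \lesssim \bar g_r$, we get from Theorem 7 that

$$\left|\frac{B_r(\Sigma)}{n\sigma_\xi} - 1\right| \lesssim \frac{r(\Sigma)}{B_r(\Sigma)\sqrt n} + \frac{1}{n}.$$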


Without loss of generality, we can and do assume that „ ^ i < c for a 

Jlj f [2-1 ) y/Ti 71 

small enough constant c > 0 so that 
of the theorem is trivial). Then 


1 


crj Sr(S) 


< 


Bri^) 


- 1 


1^1 


< 


<1/2 (otherwise, the bound 
r(E) 


1 


(jj ^ \Br{T,)y/ri nj Br{T,) 


-1^1- 


Combining this with the bound of Theorem 4, we get that with probability at least
$1 - e^{-t}$


1 


fJj Br{T,) 


< 


r(E) 


f<V 


t 


BrCS) * Br{T,)y/n 


^Br{T,)y/n n 

Using the last bound with t defined by (5.25), we easily get that 

r(E) ' 




(TJ BrCS) 


< 


Br{Ti)y/n 


/log 


Br{T,)y/n 

r(S) 


V 2 


logn 

y/n 


(5.29) 
The result now follows from (5.24), (5.28) and (5.29) by proving bounds on
$\mathbb{P}\{\xi/\sigma_\xi \le x\}$ similar to (5.22), (5.23).

□ 


6. Concluding remarks 


1. We start this section with deducing from the non-asymptotic bound of
Theorem 6 an asymptotic normality result. To this end, consider a sequence of
problems in which the data is sampled from Gaussian distributions in $\mathbb{H}$ with
mean zero and covariance $\Sigma = \Sigma^{(n)}$. Let $X = X^{(n)}$ be a centered Gaussian
random vector in $\mathbb{H}$ with covariance operator $\Sigma = \Sigma^{(n)}$ and let $X_1 = X_1^{(n)}, \dots, X_n = X_n^{(n)}$
be i.i.d. copies of $X^{(n)}$. The sample covariance based
on $(X_1^{(n)}, \dots, X_n^{(n)})$ is denoted by $\hat\Sigma_n$. Let $\sigma(\Sigma^{(n)})$ be the spectrum of $\Sigma^{(n)}$,
$\mu_r^{(n)}, r \ge 1$ be distinct nonzero eigenvalues of $\Sigma^{(n)}$ arranged in decreasing
order and $P_r^{(n)}, r \ge 1$ be the corresponding spectral projectors. As before, denote
$\Delta_r^{(n)} := \{j : \sigma_j(\Sigma^{(n)}) = \mu_r^{(n)}\}$ and let $\hat P_r^{(n)}$ be the orthogonal projector on the
direct sum of eigenspaces corresponding to the eigenvalues $\{\sigma_j(\hat\Sigma_n), j \in \Delta_r^{(n)}\}$.

Suppose that the spectral projector of $\Sigma^{(n)}$ to be estimated is $P^{(n)} = P_{r_n}^{(n)}$,
the corresponding eigenvalue is $\mu^{(n)} = \mu_{r_n}^{(n)}$, its multiplicity is $m^{(n)} = m_{r_n}^{(n)}$ and
its spectral gap is $\bar g^{(n)} = \bar g_{r_n}^{(n)}$. Denote

$$B_n := B_{r_n}(\Sigma^{(n)}) := 2\sqrt2\,\|C^{(n)}\Sigma^{(n)}C^{(n)}\|_2\,\|P^{(n)}\Sigma^{(n)}P^{(n)}\|_2.$$
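The quantity $B_n$ depends on $\Sigma$ only through its distinct eigenvalues and their multiplicities, so it is easy to compute. The sketch below assumes $C_r = \sum_{s \ne r} P_s/(\mu_r - \mu_s)$, which gives $\|P_r\Sigma P_r\|_2 = \sqrt{m_r}\,\mu_r$ and $\|C_r\Sigma C_r\|_2^2 = \sum_{s\ne r} m_s\mu_s^2/(\mu_r-\mu_s)^4$; the names `Spectrum`, `effective_rank`, `b_r` and `spiked_spectrum` are hypothetical.

```python
import math
from typing import List, Tuple

# A spectrum is a list of (distinct eigenvalue mu_s, multiplicity m_s).
Spectrum = List[Tuple[float, int]]


def effective_rank(spectrum: Spectrum) -> float:
    """r(Sigma) = tr(Sigma) / ||Sigma||_infty."""
    trace = sum(mu * m for mu, m in spectrum)
    op_norm = max(mu for mu, _ in spectrum)
    return trace / op_norm


def b_r(spectrum: Spectrum, r: int) -> float:
    """B_r(Sigma) = 2*sqrt(2) * ||C_r Sigma C_r||_2 * ||P_r Sigma P_r||_2 (r is a list index)."""
    mu_r, m_r = spectrum[r]
    hs_p = math.sqrt(m_r) * mu_r  # Hilbert-Schmidt norm of P_r Sigma P_r
    hs_c_sq = sum(
        m * mu ** 2 / (mu_r - mu) ** 4
        for s, (mu, m) in enumerate(spectrum)
        if s != r
    )  # squared Hilbert-Schmidt norm of C_r Sigma C_r (assumed form of C_r)
    return 2.0 * math.sqrt(2.0) * hs_p * math.sqrt(hs_c_sq)


def spiked_spectrum(s1_sq: float, sigma_sq: float, p: int) -> Spectrum:
    """Spectrum of s1_sq * (theta (x) theta) + sigma_sq * I_p in dimension p."""
    return [(s1_sq + sigma_sq, 1), (sigma_sq, p - 1)]


if __name__ == "__main__":
    spec = spiked_spectrum(2.0, 0.1, 1000)
    print(b_r(spec, 0), effective_rank(spec))
```

For the one-spike spectrum the computed value agrees with the closed form $2\sqrt2\,(s_1^2+\sigma^2)\sigma^2 s_1^{-4}\sqrt{p-1}$, and the effective rank grows linearly in $p$.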

The following assumption on E^") will be needed: 

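Assumption 1. Suppose the following conditions hold:

$$\sup_{n\ge1} m^{(n)} < +\infty \quad\text{and}\quad \sup_{n\ge1}\frac{\|\Sigma^{(n)}\|_\infty}{\bar g^{(n)}} < +\infty; \tag{6.1}$$

$$B_n \to \infty \quad\text{and}\quad \frac{r(\Sigma^{(n)})}{B_n\sqrt n} \to 0 \quad\text{as } n \to \infty. \tag{6.2}$$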

Note that Assumption 1 implies that $r(\Sigma^{(n)}) \to \infty$ and $r(\Sigma^{(n)}) = o(n)$ as $n \to \infty$.
This easily follows from

Bn<2^/2^/^)(^^^^^ ^r(E(")) = o(^r(EG))) 
and (6.2). It is also easy to see that, under mild further assumptions, P„ 

||eW||2. 

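Corollary 1. Suppose Assumption 1 holds. Then

$$\frac{B_n}{n\sqrt{\mathrm{Var}(\|\hat P^{(n)}-P^{(n)}\|_2^2)}} \to 1 \quad\text{as } n \to \infty$$

and the sequences of random variables

$$\left\{\frac{n\bigl(\|\hat P^{(n)}-P^{(n)}\|_2^2-\mathbb{E}\|\hat P^{(n)}-P^{(n)}\|_2^2\bigr)}{B_n}\right\}_{n\ge1} \tag{6.3}$$

and

$$\left\{\frac{\|\hat P^{(n)}-P^{(n)}\|_2^2-\mathbb{E}\|\hat P^{(n)}-P^{(n)}\|_2^2}{\sqrt{\mathrm{Var}(\|\hat P^{(n)}-P^{(n)}\|_2^2)}}\right\}_{n\ge1}$$

both converge in distribution to the standard normal random variable.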

2. Neither the normal approximation bounds of Theorem 6 nor the asymptotic
normality result of Corollary 1 can be directly used to construct confidence
regions for spectral projectors of covariance operators or to develop hypothesis
tests. The reason is that, in these results, the squared Hilbert-Schmidt norm
$\|\hat P^{(n)} - P^{(n)}\|_2^2$ is centered with its expectation and normalized with its standard
deviation (or, alternatively, with $B_n/n$), both of which depend on the unknown covariance
operator $\Sigma$. It would be of interest to develop "data-driven" versions of these
results, but this problem seems to be challenging and goes beyond the scope
of the current paper. At the moment, we have only a partial solution (that is
far from being perfect) of this problem in the case when the target spectral
projector is one-dimensional (that is, the eigenvalue $\mu^{(n)}$ is of multiplicity
one). We briefly outline such a result below. Assume that we are given a sample
of size $3n$ of i.i.d. centered Gaussian vectors



with common covariance operator $\Sigma^{(n)}$. For each of the three subsamples of size
$n$, define its sample covariance operator:


E(") = - V 


e(") = 


1 

e(") = 


Let be the orthogonal projector onto the eigenspace associated with the 
eigenvalue of E*^”^ (which is of multiplicity one with a high probability). 
Similarly, p(^^ and are the orthogonal projectors onto the eigenspaces 
associated with the eigenvalue of E^”^ and the eigenvalue of E*^”\ 
respectively. Denote 


S(") = 1 and = ^(pW,p(")) 


- 1 . 


It turns out that the statistic $-2\hat b^{(n)}$ can be used as an estimator of the
expectation $\mathbb{E}\|\hat P^{(n)} - P^{(n)}\|_2^2$ while the statistic $(1+\hat b^{(n)})^2 - (1+\tilde b^{(n)})^2$ can be
used to estimate the standard deviation $\mathrm{Var}^{1/2}(\|\hat P^{(n)} - P^{(n)}\|_2^2)$ (note that $\hat b^{(n)}$
was introduced and studied in [10] as an estimator of a "bias parameter" of
empirical spectral projectors and empirical eigenvectors). Moreover, it can be
proved that, under Assumption 1, the sequence

$$\frac{\|\hat P^{(n)}-P^{(n)}\|_2^2 + 2\hat b^{(n)}}{(1+\hat b^{(n)})^2-(1+\tilde b^{(n)})^2} \tag{6.4}$$

converges in distribution to a Cauchy type random variable.

For the spiked covariance model (1.1) with $m, s_1^2, \dots, s_m^2, \sigma^2$ being fixed and
$p = p_n \to \infty$ as $n \to \infty$, it is easy to find a simpler version of data-driven
normalization with the limit distribution being standard normal. For simplicity,
assume that $m = 1$, so the goal is to estimate the first principal component
$\theta_1$. Recall that in this case $B_n = B_1(\Sigma^{(n)}) = 2\sqrt2\,\frac{(s_1^2+\sigma^2)\sigma^2}{s_1^4}\sqrt{p-1}$ (see (5.5)).


Thus, the following estimator of $B_n$ could be used: $\hat B_n = 2\sqrt2\,\frac{\hat\mu_1\hat\mu_2}{(\hat\mu_1-\hat\mu_2)^2}\sqrt{p-1}$,
where $\hat\mu_1$ and $\hat\mu_2$ are the largest and the second largest eigenvalues of $\hat\Sigma_n = n^{-1}\sum_{j=1}^n X_j \otimes X_j$, respectively. In the case of such a spiked covariance model,
Assumption 1 is equivalent to $p = p_n \to \infty$ and $p = o(n)$. Under these
assumptions, it is easy to prove that $\frac{\hat B_n}{n} = \frac{B_n}{n}(1 + o_P(1))$.

Let $P_1 = \theta_1 \otimes \theta_1$. Then, it can be proved that the sequence



(6.5) 


converges in distribution to a standard normal random variable. 

3. To illustrate the asymptotic behavior of standard PCA, we consider the
following spiked covariance setting. Let $X_1, \dots, X_n, \tilde X_1, \dots, \tilde X_n, \bar X_1, \dots, \bar X_n$ be
$3n$ i.i.d. random vectors in $\mathbb{R}^p$ with covariance $\Sigma = s_1^2(\theta_1 \otimes \theta_1) + \sigma^2 I_p$, $s_1^2 = 2$,
$\sigma^2 = 1/10$, where $\theta_1$ is an arbitrary unit vector in $\mathbb{R}^p$. For selected values
of $(n, p)$, we computed the statistic $\|\hat P_1 - P_1\|_2^2$, $\hat B_n$ and the empirical bias
estimators $\hat b^{(n)}$, $\tilde b^{(n)}$ as well as the statistics (6.4) and

+ ( 6 . 6 ) 


We performed 1000 replications of this experiment. 
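One replication of such an experiment can be sketched in a few lines. The code below is a minimal illustration and not the authors' simulation code: it assumes Gaussian data, takes $\theta_1 = e_1$ without loss of generality, uses power iteration (a swapped-in technique) to find the top empirical eigenvector, and relies on the identity $\|\hat\theta\otimes\hat\theta - \theta\otimes\theta\|_2^2 = 2(1 - \langle\hat\theta,\theta\rangle^2)$ for unit vectors. All function names are hypothetical.

```python
import math
import random
from typing import List


def sample_spiked(n: int, p: int, s1_sq: float, sigma_sq: float,
                  rng: random.Random) -> List[List[float]]:
    """Draw n Gaussian vectors with covariance s1_sq * e1 e1^T + sigma_sq * I_p."""
    s1, sigma = math.sqrt(s1_sq), math.sqrt(sigma_sq)
    data = []
    for _ in range(n):
        x = [sigma * rng.gauss(0.0, 1.0) for _ in range(p)]
        x[0] += s1 * rng.gauss(0.0, 1.0)  # spike along theta_1 = e_1
        data.append(x)
    return data


def top_eigenvector(data: List[List[float]], rng: random.Random,
                    iters: int = 50) -> List[float]:
    """Power iteration for n^{-1} sum_j X_j (x) X_j without forming the matrix."""
    n, p = len(data), len(data[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [0.0] * p
        for x in data:
            c = sum(xi * vi for xi, vi in zip(x, v)) / n
            for i in range(p):
                w[i] += c * x[i]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v


def projector_loss(theta_hat: List[float], theta: List[float]) -> float:
    """Squared Hilbert-Schmidt distance between rank-one projectors of unit vectors."""
    inner = sum(a * b for a, b in zip(theta_hat, theta))
    return max(0.0, 2.0 * (1.0 - inner * inner))


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 200, 20
    data = sample_spiked(n, p, 2.0, 0.1, rng)
    theta_hat = top_eigenvector(data, rng)
    print(projector_loss(theta_hat, [1.0] + [0.0] * (p - 1)))
```

Averaging `projector_loss` over many replications gives the sample risk that the tables compare against the first order approximation; the small sizes here are chosen only so the pure-Python loop runs quickly.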

In Table 1, we compare the sample mean of the statistic $\|\hat P_1 - P_1\|_2^2$, denoted by
$\hat m_n$ (that provides an estimator of the risk $\mathbb{E}\|\hat P_1 - P_1\|_2^2$ based on the repeated
samples of size $n$), to the estimated risk $-2\hat b^{(n)}$ for each individual sample and
the first order approximation $A_n/n$ of the theoretical risk derived in (1.3), which can
be computed easily in this model since $A_n := A_1(\Sigma) = 2\,\frac{(s_1^2+\sigma^2)\sigma^2}{s_1^4}(p-1)$. More
precisely, in the second row of the table the sample means of $\frac{|2\hat b^{(n)} + \hat m_n|}{|\hat m_n|}$ over
1000 replications of the experiment are presented. The results show that $-2\hat b^{(n)}$
provides a somewhat better approximation of the risk $\mathbb{E}\|\hat P_1 - P_1\|_2^2$ than the
first order approximation (1.3) for small sample size. For relatively large sample
size, the first order approximation (1.3) becomes more precise than the estimator $-2\hat b^{(n)}$.

n                                          100     200     300     500     10^3    10^4
|A_n/n - \hat m_n| / |\hat m_n|            0.49    0.24    0.15    0.1     0.049   0.008
|2\hat b^{(n)} + \hat m_n| / |\hat m_n|    0.07    0.06    0.054   0.052   0.045   0.036


Table 1

Relative deviation of the risk approximation $A_n/n$ and the risk estimator $-2\hat b^{(n)}$ from the

sample risk mn for p = 10 ^. 

In Table 2, we compare the sample variance of the statistic $\|\hat P_1 - P_1\|_2^2$, denoted
by $\hat\sigma_n^2$, to the variance estimator $V_n := \bigl((1+\hat b^{(n)})^2 - (1+\tilde b^{(n)})^2\bigr)^2$ and also to the
first order approximation $B_n^2/n^2$ of the theoretical variance derived in (1.4) with
$B_n = 2\sqrt2\,\frac{(s_1^2+\sigma^2)\sigma^2}{s_1^4}\sqrt{p-1}$. Again, in the second row of the table the sample
means of $\frac{|V_n - \hat\sigma_n^2|}{\hat\sigma_n^2}$ over 1000 replications of the experiment are presented. We
observe that $V_n$ and $B_n^2/n^2$ provide reasonable approximations of the variance of
$\|\hat P_1 - P_1\|_2^2$ only for relatively large sample sizes.


n                                                100     200     300     500     10^3    10^4
|B_n^2/n^2 - \hat\sigma_n^2| / \hat\sigma_n^2    0.62    0.65    0.66    0.58    0.42    0.07
|V_n - \hat\sigma_n^2| / \hat\sigma_n^2          0.82    0.73    0.67    0.58    0.39    0.05


Table 2


Relative deviation of the variance estimator $V_n$ and the variance approximation $B_n^2/n^2$ from
the sample variance for p = 10 ^. 


Finally, we compute empirical densities of the statistics (6.4) and (6.6) and 
compare them with their respective theoretical limiting distributions in Figure 
1. For (6.6), we also provide the empirical mean and variance. 
[Figure 1: density plots. Panel titles for (6.6): mean = -0.03, variance = 1.61; mean = -0.02, variance = 1.53; mean = -0.01, variance = 1.14; mean = -0.02, variance = 1.008. Legible panel labels: n = 5000, n = 20000.]

Figure 1. Top: empirical distribution of (6.6) and standard normal density for p = 1000.
Bottom: empirical distribution and theoretical Cauchy distribution of (6.4) for p = 1000.